1 Exercise 01

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

  a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

  b. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
auto_interval <- fc %>% 
    hilo() %>% 
    unpack_hilo("95%") %>% 
    select(4:7) %>% 
    slice(1)

auto_interval
## # A tsibble: 1 x 5 [1M]
##    .mean                  `80%` `95%_lower` `95%_upper`    Month
##    <dbl>                 <hilo>       <dbl>       <dbl>    <mth>
## 1 95187. [83200.06, 107173.1]80      76855.     113518. 2019 Jan

Let’s compute those values manually:

sd_res <- fit %>% 
    augment() %>% 
    pull(.resid) %>% 
    sd()

auto_interval$.mean[1] + c(-1, 1) * (qnorm(0.975) * sd_res) 
## [1]  76871.35 113501.77

Close, but not identical: `sd()` computes the plain sample standard deviation of the residuals, while R’s interval is based on the model’s reported \(\sigma^2\) estimate, which differs slightly.
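To check, plug the model’s reported \(\sigma^2\) into the same formula (the numbers below are copied from the ETS() output above):

```r
mean_fc <- 95186.57        # first forecast mean, from the output above
sigma   <- sqrt(87480760)  # model-reported sigma^2
interval <- mean_fc + c(-1, 1) * qnorm(0.975) * sigma
round(interval)  # 76855 113518
```

This reproduces R’s interval (76855, 113518), so the small gap really does come from the residual-variance estimate, not the normal quantile.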

2 Exercise 02

Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter \(\alpha\)) and level (the initial level \(\ell_0\)). It should return the forecast of the next observation in the series. Does it give the same forecast as ETS()?

I took inspiration from this source. We both get the same values for this sub-exercise, but for the next one, this function yields values closer to those from ETS().

exp_smoothing_manual <- function(y, arg_alpha, arg_ell_zero, bool_forecast_h1 = FALSE) {
    
    # Run the recursion one extra step when the one-step-ahead forecast
    # (rather than the last fitted value) is wanted.
    if (bool_forecast_h1) {
        total_len <- length(y) + 1
    } else {
        total_len <- length(y)
    }
    
    anti_alpha <- 1 - arg_alpha
    
    store_predictions <- rep(NA, total_len)
    store_predictions[1] <- arg_ell_zero
    
    for (i in seq_along(store_predictions)) {
        
        if (i == 1) {
            # At t = 1, both the "previous observation" and the previous
            # fitted value are the initial level.
            last_y <- store_predictions[i]
            last_yhat <- store_predictions[i]
        } else {
            last_y <- y[i - 1]
            last_yhat <- store_predictions[i - 1]
        }
        
        # SES recursion: yhat_t = alpha * y_{t-1} + (1 - alpha) * yhat_{t-1}
        yhat <- arg_alpha * last_y + anti_alpha * last_yhat
        
        store_predictions[i] <- yhat
        
    }
    
    # yhat is a scalar holding the final (one-step-ahead) prediction.
    return(yhat)
    
}

fit_model_pars <- coef(fit) %>% 
    select(term, estimate)

value_manual <- exp_smoothing_manual(
    y = aus_pigs$Count,
    arg_alpha = fit_model_pars$estimate[fit_model_pars$term == "alpha"],
    arg_ell_zero = fit_model_pars$estimate[fit_model_pars$term == "l[0]"],
    bool_forecast_h1 = TRUE
)

value_auto <- fc$.mean[1]

value_manual == value_auto
## [1] TRUE

Great!
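The same recursion can also be written as a left fold over the series; here is a compact base-R sketch (the name `ses_forecast` is mine, not from the exercise):

```r
# SES one-step-ahead forecast as a fold: the level after observing y_t is
# alpha * y_t + (1 - alpha) * level_{t-1}, and the forecast of y_{T+1}
# is simply the final level.
ses_forecast <- function(y, alpha, l0) {
  Reduce(function(level, obs) alpha * obs + (1 - alpha) * level, y, init = l0)
}

ses_forecast(c(1, 2, 3), alpha = 0.5, l0 = 0)  # 2.125
```

Since both implement the same recursion, this should return the same value as `exp_smoothing_manual(..., bool_forecast_h1 = TRUE)` on the pigs data.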

3 Exercise 03

Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of \(\alpha\) and \(\ell_0\). Do you get the same values as the ETS() function?

Note: this source got a value of 0.299 for alpha and 76379.265 for the level, but the code below is heavily modified and doesn’t reproduce the source’s values. Anyway, the correct values are:

true_values <- coef(fit) %>% 
    select(term, estimate)

true_values
## # A tibble: 2 × 2
##   term    estimate
##   <chr>      <dbl>
## 1 alpha      0.322
## 2 l[0]  100647.

Let’s try to get optimal values:

optimize_exp_smoothing <- function(pars, y) {
    
    # pars = c(alpha, ell_0)
    out_predicted <- rep(NA, length(y))
    
    # One-step-ahead prediction of y[i] using only y[1:(i - 1)].
    for (i in seq_along(out_predicted)) {
        
        out_predicted[i] <- exp_smoothing_manual(
            y = y[1:i],
            arg_alpha = pars[1],
            arg_ell_zero = pars[2]
        )
        
    }
    
    squared_errors <- (out_predicted - y) ^ 2
    
    # sqrt() is monotone, so minimising sqrt(SSE) yields the same optimum
    # as minimising the SSE itself, as the exercise asks.
    out <- sqrt(sum(squared_errors))
    return(out)
    
}

optim_pigs <- optim(
    par = c(0.5, aus_pigs$Count[1]),
    y = aus_pigs$Count,
    fn = optimize_exp_smoothing,
    method = "Nelder-Mead"
)

true_values %>% 
    mutate(my_values = optim_pigs$par,
           pct_diff = (my_values / estimate) - 1)
## # A tibble: 2 × 4
##   term    estimate my_values   pct_diff
##   <chr>      <dbl>     <dbl>      <dbl>
## 1 alpha      0.322     0.322 -0.0000323
## 2 l[0]  100647.    99223.    -0.0141

Very small differences. For \(\alpha\), I am practically getting the correct value, while for \(\ell_{0}\) (the initial level), I missed the mark by 1.41%.
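A side note: optimize_exp_smoothing above re-runs the entire recursion from scratch for every i, which makes the objective O(n²). An equivalent O(n) objective (a sketch; the name `ses_sse` is mine) updates the level in a single pass:

```r
ses_sse <- function(pars, y) {
  alpha <- pars[1]
  level <- pars[2]  # ell_0
  sse <- 0
  for (obs in y) {
    err   <- obs - level                        # one-step-ahead error
    sse   <- sse + err^2
    level <- alpha * obs + (1 - alpha) * level  # SES level update
  }
  sse
}

# Toy check against a hand computation:
ses_sse(c(0.5, 0), c(1, 2, 3))  # 1^2 + 1.5^2 + 1.75^2 = 6.3125
```

Passing `ses_sse` to optim() in place of optimize_exp_smoothing should find the same optimum (it minimises the raw SSE, but sqrt is monotone, so both objectives share the argmin).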

4 Exercise 04

Combine your previous two functions to produce a function that both finds the optimal values of \(\alpha\) and \(\ell_{0}\), and produces a forecast of the next observation in the series.

exp_smooth_combine <- function(time_series) {
    
    optim_series <- optim(
        par = c(0.5, time_series[1]),
        y = time_series,
        fn = optimize_exp_smoothing,
        method = "Nelder-Mead"
    )
    
    out <- exp_smoothing_manual(
        y = time_series,
        arg_alpha = optim_series$par[1],
        arg_ell_zero = optim_series$par[2],
        TRUE
    )
    
    return(out)
    
}

var_forecast <-  exp_smooth_combine(aus_pigs$Count)
var_forecast
## [1] 95186.57

Is this the same as the forecast value from fable?

## My Value: 95,186.57 | model value: 95,186.56

5 Exercise 05

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

  a. Plot the Exports series and discuss the main features of the data.
china_exports <- global_economy %>% 
    filter(Country == "China") %>% 
    select(Year, Exports)

china_exports %>% 
    autoplot(Exports) +
    labs(title = "China Exports") +
    scale_x_continuous(n.breaks = 20)

We can see that China’s exports of goods and services (% of GDP) have a positive trend until 2008. After 2008, a massive drop is visible. Since the data are annual, there is no seasonality present.

  b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

  c. Compute the RMSE values for the training data.
## [1] 1.900172
  d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

The simple ETS(A,N,N) model shows a stable, flat level of exports over the forecast horizon, while the trended ETS(A,A,N) model shows a downward future trajectory. Prediction intervals for both methods show that there is considerable uncertainty in future exports over the five-year forecast period.

Regarding what actually happened from 2018 onward, this link is helpful and shows that, for this case study, the model in the right graph would have been more appropriate.

  e. Compare the forecasts from both methods. Which do you think is best?
## # A tibble: 2 × 10
##   .model                .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>                 <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(Exports ~ error… Trai…  0.266   1.90  1.26  1.84   9.34 0.983 0.991 0.288
## 2 "ETS(Exports ~ error… Trai… -0.0854  1.91  1.25 -0.169  9.57 0.973 0.995 0.232

The ETS(A,N,N) model is more accurate on the training data, but only slightly:

  - RMSE: 1.90 vs. 1.91
  - MAPE: 9.34 vs. 9.57

  f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

For the solution, see the code snippet in Exercise 01, b.
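That calculation can also be wrapped in a small base-R helper (the name `pi_95` and the illustrative numbers are mine): point forecast ± 1.96 × RMSE, assuming normal errors.

```r
# 95% prediction interval from a point forecast and an RMSE,
# assuming normally distributed errors.
pi_95 <- function(point, rmse) {
  point + c(-1, 1) * qnorm(0.975) * rmse
}

# e.g. with an illustrative point forecast of 20 (% of GDP)
# and the training RMSE of 1.90 from part c:
pi_95(20, 1.90)
```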

6 Exercise 06

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
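A quick base-R sketch of what damping does (function name and numbers are illustrative): the h-step Holt forecast adds \(h \cdot b_T\) to the level, while the damped version adds \((\phi + \phi^2 + \dots + \phi^h)\, b_T\), so forecasts flatten toward a horizontal asymptote instead of growing linearly.

```r
# h-step-ahead point forecasts for a damped linear trend:
# yhat_{T+h} = level + (phi + phi^2 + ... + phi^h) * slope
damped_path <- function(level, slope, phi, h) {
  level + cumsum(phi ^ seq_len(h)) * slope
}

damped_path(100, 10, phi = 0.9, h = 5)  # 109.0 117.1 124.4 131.0 136.9
# versus the undamped Holt path: 110, 120, 130, 140, 150
```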

china_fit %>% 
    accuracy() %>% 
    arrange(MAPE)
## # A tibble: 4 × 10
##   .model            .type       ME    RMSE     MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>             <chr>    <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(box_cox(GDP… Trai… -2.75e10 2.88e11 1.25e11 0.607  7.17 0.578 0.688 0.665
## 2 "ETS(box_cox(GDP… Trai…  6.35e 9 1.96e11 1.02e11 1.77   7.26 0.472 0.468 0.135
## 3 "ETS(box_cox(GDP… Trai… -6.22e10 3.08e11 1.32e11 0.118  7.36 0.611 0.735 0.670
## 4 "ETS(box_cox(GDP… Trai…  2.10e11 4.16e11 2.13e11 8.37  10.8  0.983 0.991 0.790

7 Exercise 07

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

The multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series, which is precisely the case here.
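A toy base-R illustration of the difference (all numbers made up): with multiplicative seasonality, the seasonal swing scales with the level, which is exactly the pattern the Gas series shows.

```r
level   <- seq(100, 400, length.out = 12)  # rising trend
factors <- rep(c(1.2, 0.8, 1.1, 0.9), 3)   # multiplicative seasonal factors
y_mult  <- level * factors

# The peak-to-trough swing grows with the level:
diff(range(y_mult[1:4]))   # early in the series
diff(range(y_mult[9:12]))  # late in the series -- noticeably larger
```

An additive seasonal term (`level + constant`) would keep the swing constant regardless of the level, which doesn’t match the data.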

Making the trend damped improves the forecast!

8 Exercise 08

Recall your retail time series data (from Exercise 8 in Section 2.10).

  a. Why is multiplicative seasonality necessary for this series?

The multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series, which is precisely the case here.

  b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

  c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
## # A tibble: 2 × 12
##   State       Indus…¹ .model .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE
##   <chr>       <chr>   <chr>  <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 Northern T… Clothi… "ETS(… Trai… -0.00119 0.600 0.443 -0.265  5.21 0.506 0.517
## 2 Northern T… Clothi… "ETS(… Trai…  0.0582  0.602 0.449  0.499  5.23 0.512 0.520
## # … with 1 more variable: ACF1 <dbl>, and abbreviated variable name ¹​Industry
## # ℹ Use `colnames()` to see all variable names
  d. Check that the residuals from the best method look like white noise.

## # A tibble: 2 × 5
##   State              Industry                             .model lb_stat lb_pv…¹
##   <chr>              <chr>                                <chr>    <dbl>   <dbl>
## 1 Northern Territory Clothing, footwear and personal acc… "ETS(…   0.663   0.415
## 2 Northern Territory Clothing, footwear and personal acc… "ETS(…   0.894   0.344
## # … with abbreviated variable name ¹​lb_pvalue

The Ljung-Box test shows insignificant p-values (0.42 and 0.34), meaning that we cannot reject \(H_{0}\), which states that the data are independently distributed (= white noise).
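The same test is available in base R as `Box.test()`; a quick sketch on synthetic data (white noise vs. a random walk) shows how it behaves:

```r
set.seed(123)
wn <- rnorm(200)          # white noise: residuals of a well-specified model
rw <- cumsum(rnorm(200))  # random walk: heavily autocorrelated

p_wn <- Box.test(wn, lag = 24, type = "Ljung-Box")$p.value
p_rw <- Box.test(rw, lag = 24, type = "Ljung-Box")$p.value

# The random-walk p-value is essentially zero (strong autocorrelation),
# while the white-noise p-value is typically well above 0.05, so we fail
# to reject independence only for the white-noise series.
```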

  e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
## # A tibble: 2 × 12
##   .model   State Indus…¹ .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr> <chr>   <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(Tu… Nort… Clothi… Test   0.504  1.38  1.03   2.68  7.42   NaN   NaN 0.599
## 2 "NAIVE(… Nort… Clothi… Test  -4.66   5.39  5.00 -40.4  42.1    NaN   NaN 0.262
## # … with abbreviated variable name ¹​Industry

ETS beats NAIVE, quite clearly.

9 Exercise 09

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Let’s do this!

optimal_lambda <- guerrero(myseries$Turnover) %>% unname()

myseries_boxcox <- myseries %>% 
    mutate(Turnover = box_cox(Turnover, optimal_lambda))


myseries_boxcox_dcmp <- myseries_boxcox %>% 
    model(STL(Turnover ~ trend(window = 12), robust = TRUE)) %>% 
    components() %>% 
    select(-.model)

myseries_boxcox_dcmp_tt_split <- time_series_split(myseries_boxcox_dcmp, nrow(myseries_boxcox_dcmp %>% filter(year(Month) <= 2010)))

myseries_boxcox_dcmp %>% 
    mutate(Category = ifelse(Month %in% myseries_boxcox_dcmp_tt_split$train$Month, "Train", "Test")) %>% 
    ggplot(aes(x = Month, y = season_adjust, color = Category)) +
    geom_line() +
    labs(title = "Seasonally adjusted data") +
    scale_color_manual(values = c("red", "black"))

myseries_boxcox_dcmp_fit_train <- myseries_boxcox_dcmp_tt_split$train %>% 
    model(ETS(season_adjust ~ error("A") + trend("A") + season("N")))


myseries_boxcox_dcmp_fc_test <- myseries_boxcox_dcmp_fit_train %>% 
    forecast(new_data = myseries_boxcox_dcmp_tt_split$test)

check_forecast_with_split_data(
    object_fit = myseries_boxcox_dcmp_fit_train,
    object_fc = myseries_boxcox_dcmp_fc_test,
    object_data = myseries_boxcox_dcmp,
    object_test = myseries_boxcox_dcmp_tt_split$test
)

The results are even better!
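One caveat with this pipeline: since the Turnover column was transformed before modelling (rather than putting `box_cox()` inside the model formula, where fabletools back-transforms forecasts automatically), the forecasts live on the Box-Cox scale and must be inverted before comparing with earlier models on the original scale. A base-R round trip (helper names are my own):

```r
# Forward and inverse Box-Cox transform (lambda != 0 branch shown explicitly).
bc_fwd <- function(x, lambda) {
  if (lambda == 0) log(x) else (x ^ lambda - 1) / lambda
}
bc_inv <- function(z, lambda) {
  if (lambda == 0) exp(z) else (lambda * z + 1) ^ (1 / lambda)
}

# The round trip recovers the original value:
bc_inv(bc_fwd(50, lambda = 0.3), lambda = 0.3)  # 50
```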

10 Exercise 10

Compute the total domestic overnight trips across Australia from the tourism dataset.

  a. Plot the data and describe the main features of the series.

##         trend_strength seasonal_strength_year     seasonal_peak_year 
##              "0.94579"              "0.74700"              "1.00000" 
##   seasonal_trough_year              spikiness              linearity 
##              "3.00000"     "12,063,155.58598"          "9,641.45055" 
##              curvature             stl_e_acf1            stl_e_acf10 
##         "11,857.46735"             "-0.43415"              "0.40279" 
##                   acf1                  acf10             diff1_acf1 
##              "0.72174"              "2.57513"             "-0.22286" 
##            diff1_acf10             diff2_acf1            diff2_acf10 
##              "1.19018"             "-0.39996"              "0.93099" 
##            season_acf1                  pacf5            diff1_pacf5 
##              "0.68044"              "0.82441"              "0.61493" 
##            diff2_pacf5            season_pacf          zero_run_mean 
##              "0.88030"              "0.24853"              "0.00000" 
##     nonzero_squared_cv        zero_start_prop          zero_end_prop 
##              "0.01010"              "0.00000"              "0.00000" 
##        lambda_guerrero              kpss_stat            kpss_pvalue 
##              "1.99993"              "0.84674"              "0.01000" 
##                pp_stat              pp_pvalue                 ndiffs 
##             "-1.83097"              "0.10000"              "2.00000" 
##                nsdiffs                bp_stat              bp_pvalue 
##              "1.00000"             "41.67324"              "0.00000" 
##                lb_stat              lb_pvalue          var_tiled_var 
##             "43.25577"              "0.00000"              "0.05751" 
##         var_tiled_mean        shift_level_max      shift_level_index 
##              "0.84005"          "2,415.32131"              "2.00000" 
##          shift_var_max        shift_var_index           shift_kl_max 
##      "3,735,701.35753"             "45.00000"              "2.23865" 
##         shift_kl_index       spectral_entropy      n_crossing_points 
##             "75.00000"              "0.49398"             "26.00000" 
##      longest_flat_spot             coef_hurst           stat_arch_lm 
##              "3.00000"              "0.97717"              "0.83773"

The high trend_strength (0.95) and seasonal_strength_year (0.75) indicate a strong trend with pronounced yearly seasonality; the seasonal peak falls in the first quarter and the trough in the third.

  b. Decompose the series using STL and obtain the seasonally adjusted data.

  c. Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be specified using decomposition_model().)
tourism_australia_dcmp_fit <- tourism_australia_dcmp %>% 
    model(ETS(season_adjust ~ error("A") + trend("Ad") + season("N")),
          ETS(season_adjust ~ error("A") + trend("A") + season("N")))

Part c. is solved together with part d. below.

  d. Forecast the next two years of the series using an appropriate model for Holt’s linear method applied to the seasonally adjusted data (as before but without damped trend).

  e. Now use ETS() to choose a seasonal model for the data.

  f. Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?
## # A tibble: 3 × 3
##   .model                                                              RMSE  MAPE
##   <chr>                                                              <dbl> <dbl>
## 1 "ETS(season_adjust ~ error(\"A\") + trend(\"A\") + season(\"N\"))"  789.  2.82
## 2 "ETS(season_adjust ~ error(\"A\") + trend(\"Ad\") + season(\"N\")…  790.  2.83
## 3 "ETS(Trips ~ error(\"A\") + trend(\"A\") + season(\"A\"))"          794.  2.86

Holt’s linear method gives the most precise in-sample fit (training data).

  g. Compare the forecasts from the three approaches. Which seems most reasonable?

Honestly, the model with the worst in-sample performance seems to give the most reasonable forecast, since some seasonality is present in the original data.

  h. Check the residuals of your preferred model.

Everything seems fine.

11 Exercise 11

12 Exercise 12

13 Exercise 13

14 Exercise 14

15 Exercise 15

16 Exercise 16

17 Exercise 17